Authorship Attribution with Author-aware Topic Models

نویسندگان

  • Yanir Seroussi
  • Fabian Bohnert
  • Ingrid Zukerman
چکیده

Authorship attribution deals with identifying the authors of anonymous texts. Building on our earlier finding that the Latent Dirichlet Allocation (LDA) topic model can be used to improve authorship attribution accuracy, we show that employing a previously-suggested Author-Topic (AT) model outperforms LDA when applied to scenarios with many authors. In addition, we define a model that combines LDA and AT by representing authors and documents over two disjoint topic sets, and show that our model outperforms LDA, AT and support vector machines on datasets with many authors.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Authorship Attribution with Topic Models

Authorship attribution deals with identifying the authors of anonymous texts. Traditionally, research in this field has focused on formal texts, such as essays and novels, but recently more attention has been given to texts generated by on-line users, such as e-mails and blogs. Authorship attribution of such on-line texts is a more challenging task than traditional authorship attribution, becau...

متن کامل

Investigating Topic Influence in Authorship Attribution

The aim of this paper is to explore text topic influence in authorship attribution. Specifically, we test the widely accepted belief that stylometric variables commonly used in authorship attribution are topic-neutral and can be used in multi-topic corpora. In order to investigate this hypothesis, we created a special corpus, which was controlled for topic and author simultaneously. The corpus ...

متن کامل

Author Attribution Evaluation with Novel Topic Cross-validation

The practice of using statistical models in predicting authorship (so-called author attribution models) is long established. Several recent authorship attribution studies have indicated that topic-specific cues impact author attribution machine learning models. The arrival of new topics should be anticipated rather than ignored in an author attribution evaluation methodology; a model that relie...

متن کامل

Entropy-Based Authorship Search in Large Document Collections

The purpose of authorship search is to identify documents written by a particular author in large document collections. Standard search engines match documents to queries based on topic, and are not applicable to authorship search. In this paper we propose an approach to authorship search based on information theory. We propose relative entropy of style markers for ranking, inspired by the lang...

متن کامل

The effect of author set size and data size in authorship attribution

Applications of authorship attribution ‘in the wild’ [Koppel, M., Schler, J., and Argamon, S. (2010). Authorship attribution in the wild. Language Resources and Evaluation. Advanced Access published January 12, 2010:10.1007/ s10579-009-9111-2], for instance in social networks, will likely involve large sets of candidate authors and only limited data per author. In this article, we present the r...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012